Zipf's law and the structure and evolution of languages
Abstract
By using a vast number of examples in social and economic data, including natural languages, George Zipf was able to show an amazingly robust functional form of the rank-frequency plots [1], f ∝ 1/r (f for frequency, r for rank), now commonly called Zipf's curve or Zipf's law. George Miller, a renowned linguist, summarized this study in 1965: "Faced with this massive statistical regularity, you have two alternatives. Either you can assume that it reflects some universal property of human mind, or you can assume that it represents some necessary consequence of the laws of probabilities. Zipf chose the synthetic hypothesis and searched for a principle of least effort that would explain the apparent equilibrium between uniformity and diversity in our use of words. Most others who were subsequently attracted to the problems chose the analytic hypothesis and searched for a probabilistic explanation. Now, thirty years later, it seems clear that the others were right. Zipf's curves are merely one way to express a necessary consequence of regarding a message source as a stochastic process" [2]. Miller based his comment on the work of Mandelbrot [3], who first proved Zipf's law in "monkey-typing" texts by minimizing the average cost of transmitting information. Later, Miller showed that the least-cost part is not really necessary in the proof [4]: the cause of Zipf's law in random texts is purely statistical. Here is how he summarized it: "It seems, therefore, that Zipf's rule can be derived from simple assumptions that do not strain one's credulity (unless the random placement of spaces seems incredible), without appeal to least effort, least cost, maximal information, or any other branch of the calculus of variations. The rule is a simple consequence of those intermittent silences which we imagine to exist between successive words" [4]. Another thirty years have passed.
Do we have any new evidence that Zipf's explanation of Zipf's law in natural language by the least-effort principle is more correct than a statistical explanation? Zipf's explanation was reiterated in a Commentary by Tsonis, Schultz and Tsonis (TST) [5]. TST argue that random texts are not relevant to natural languages for the purpose of proving Zipf's law because: (1) all combinations of letters are considered as possible words in random texts but not in natural languages; (2) the frequency of occurrence of a word is a …
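The "monkey-typing" argument discussed above can be explored numerically. The following is a minimal sketch, not the derivation of Mandelbrot or Miller: it assumes a hypothetical four-letter alphabet, a space probability of 0.2, and equally likely letters, and the function names (`monkey_text`, `rank_frequency`, `loglog_slope`) are illustrative only. It generates a random text, ranks the resulting "words" by frequency, and estimates the slope of log f against log r with a crude least-squares fit.

```python
import math
import random
from collections import Counter


def monkey_text(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Random text: each character is a space with probability p_space,
    otherwise a letter drawn uniformly from `alphabet` (assumed values)."""
    rng = random.Random(seed)
    chars = [
        " " if rng.random() < p_space else rng.choice(alphabet)
        for _ in range(n_chars)
    ]
    return "".join(chars)


def rank_frequency(text):
    """Word counts sorted from most frequent (rank 1) to least frequent."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


freqs = rank_frequency(monkey_text(200_000))
slope = loglog_slope(freqs)
print(f"{len(freqs)} distinct words, log-log slope ~ {slope:.2f}")
```

A negative slope on the log-log scale indicates a power-law-like decay of frequency with rank. The fit over all ranks is dominated by the many rare words in the tail, so the estimated value should be read as a rough indication rather than a precise exponent.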
Similar resources
Comments on "linguistic features in eukaryotic genomes"
Tsonis and Tsonis [1] study rank-ordered distributions of the number of occurrences of protein domains in four different organisms, and they argue that the power-law decay, f ∝ 1/r, of the number f of occurrences of a protein domain with its rank r suggests the presence of linguistic features in eukaryotic genomes, and that this finding "may lead to important clues about the evolution of langu...
Emergence of Zipf's Law in the Evolution of Communication
Zipf's law seems to be ubiquitous in human languages and appears to be a universal property of complex communicating systems. Following the early proposal made by Zipf concerning the presence of a tension between the efforts of speaker and hearer in a communication system, we introduce evolution by means of a variational approach to the problem based on Kullback's Minimum Discrimination of Info...
Least effort and the origins of scaling in human language.
The emergence of a complex language is one of the fundamental events of human evolution, and several remarkable features suggest the presence of fundamental principles of organization. These principles seem to be common to all languages. The best known is the so-called Zipf's law, which states that the frequency of a word decays as a (universal) power law of its rank. The possible origins of th...
Syllable structure in Old, Middle and Modern Persian: A contrastive analysis
The evolution of languages has always been of interest to linguists. In this paper we study the natural progress of syllable structure from Old Persian (O.P) through Middle Persian (Mi.P) to Modern Persian (Mo.P). For this purpose all the words containing consonant sequences are collected from specific sources of each of these languages, and then analysed according to the syllab...
Comments to "Bell Curves and Monkey Languages", J. Casti, Complexity, 1, 12-15 1995.
Whether there are universal laws or principles in complex systems is a fascinating and important question. Prof. John Casti uses the case of the Normal Distribution ("bell curves") to illustrate that such a universal principle is perhaps out there waiting to be discovered [1]. He suggests Zipf's law as a candidate for such a universal principle. But as the author of one of the three publications to pro...
Journal title:
- Complexity
Volume 2, Issue
Pages -
Publication date 1997